The nonstructural protein 1 (nsp1) of the severe acute respiratory syndrome coronavirus has 179 residues and is the N-terminal cleavage product of the viral replicase polyprotein that mediates RNA replication and processing. The specific function of nsp1 is not known. Here we report the nuclear magnetic resonance structure of the nsp1 segment from residue 13 to 128, which represents a novel α/β-fold formed by a mixed parallel/antiparallel six-stranded β-barrel, an α-helix covering one opening of the barrel, and a 3₁₀-helix alongside the barrel. We further characterized the full-length 179-residue protein and show that the polypeptide segments of residues 1 to 12 and 129 to 179 are flexibly disordered. The structure is analyzed in a search for possible correlations with the recently reported activity of nsp1 in the degradation of mRNA.


After the major outbreak of severe acute respiratory syndrome (SARS) in the beginning of 2003, the SARS coronavirus (SARS-CoV) became a major topic of coronavirus research. The coronavirus genome is composed of a single plus-strand RNA of about 30 kb, which is the largest nonsegmented genome among known RNA viruses. About two-thirds of the coronavirus genome is devoted to encoding the replicase that mediates viral RNA synthesis (64). The replicase gene comprises two large open reading frames (ORFs) located at the 5' end of the genome. The first one, ORF1a, encodes a polyprotein of 450 to 500 kDa (polyprotein 1a), and the second one, ORF1b, is translated together with ORF1a after a −1 ribosomal frameshift, leading to the expression of "polyprotein 1ab," which has a size of 750 to 800 kDa. The replicase polyproteins are processed by ORF1a-encoded viral proteinases into about 16 nonstructural proteins (nsps), which are numbered consecutively from the N terminus to the C terminus of the polyprotein (14, 53).

The large number of mature proteins produced from the polyprotein indicates a high level of complexity of the viral replication process. Some of the enzymatic activities that have been detected or predicted in SARS-CoV to date include the main protease (nsp5), a papain-like proteinase (nsp3d; PLpro), an RNA-dependent RNA polymerase (nsp12), an RNA helicase (nsp13), an endoribonuclease (nsp15), an ADP-ribose-1''-phosphatase (nsp3b), a deubiquitinase (nsp3d), a 3'→5' exoribonuclease (nsp14), and a ribose-2'-O-methyltransferase (nsp16) (3, 5, 18, 21, 22, 44, 59, 70). For the two proteases, the endoribonuclease, and the ADP-ribose-1''-phosphatase, three-dimensional structures have been solved, which together with biochemical data have revealed some aspects of the enzyme mechanisms (27, 54, 56, 58, 63, 67; reviewed in references 4, 35, and 65). The physiological functions of several of the other nonstructural proteins remain to be determined, but high-resolution structure determinations have allowed the identification of possible functional sites and provided the basis for further biochemical studies (15, 30, 52, 61, 62, 68). Other replicase proteins still remain to be characterized.

Nsp1 is the N-terminal cleavage product of the replicase polyprotein and is produced by the action of PLpro. It is among the least well-understood nsps, and no viral or cellular homologs are known outside the coronaviruses. Levels of sequence conservation among the different coronaviruses are highest at the 3' end of the genome, and the sequences are very divergent at the 5' end, especially in nsp1 to nsp3, which are products of PLpro cleavage. Nsp1 has been proposed to be useful as a group-specific marker (59). In the group 1 coronaviruses, nsp1 (also known as p9) is a protein of about 110 residues, with 20 to 50% sequence identity among all group 1 CoVs. The viruses of subgroup 2a, such as murine hepatitis virus (MHV) and human coronavirus OC43, encode an nsp1 protein of about 245 residues, also known as p28, while the group 3 viruses (avian) do not encode an nsp1. The nsp1 of SARS-CoV, which has been classified as the only member to date of subgroup 2b (19, 20, 59), comprises 180 residues, with a molecular mass of 20 kDa. Nsp1 sequences are divergent between groups 2a and 2b, and no sequence similarity between SARS-CoV nsp1 and group 2a nsp1 proteins could be identified using standard searching tools such as BLAST.

Biochemical experiments demonstrated interactions between MHV nsp1 and two other replication proteins (nsp7 and nsp10) and colocalization with nonstructural proteins and the nucleocapsid protein at viral replication complexes in the cytoplasm during the early stages of infection (6). In contrast, during the later stages of infection, MHV nsp1 was found to colocalize with structural proteins at virion assembly sites (6). Mutations at the nsp1/nsp2 cleavage site of MHV that prevented the cleavage of nsp1 from the polyprotein caused slower growth and reduced RNA synthesis relative to wild-type viruses (13). Deletion of the nsp1-coding region in infectious clones of MHV yielded viruses that were unable to productively infect cultured cells (7). Furthermore, exogenous expression of MHV nsp1 in mammalian cells arrested the cell cycle in the G0/G1 phase and inhibited cell proliferation (8). A point mutation in the proteolytic cleavage site between nsp1 and nsp2 in the full-length genome, and in minigenomes of the group 1 CoV porcine transmissible gastroenteritis virus, blocked the release of nsp1 from the nascent polyprotein and caused a dramatic reduction in virus viability (17). SARS-CoV nsp1 was shown to specifically accelerate the degradation of mRNA and thus lead to a reduction in cellular protein synthesis, which may provide a survival advantage for the virus (31). Overall, these observations indicate that nsp1 might participate in multiple stages of the coronavirus life cycle, and they implicate this protein as a potentially important virulence factor.

This paper describes the nuclear magnetic resonance (NMR) structure of SARS-CoV nsp1. Before this study, no information about the three-dimensional structure of nsp1 was available, and SARS-CoV nsp1 does not have significant amino acid sequence similarity with any protein of known three-dimensional structure. SARS-CoV nsp1 was therefore selected for NMR structure determination by the Consortium for Functional and Structural Proteomics of SARS-CoV-Related Proteins (http://sars.scripps.edu). The availability of a high-resolution solution structure will help to guide further investigations of the biochemical and physiological functions of nsp1.

MATERIALS AND METHODS 

Target optimization strategy. Six constructs truncated at different positions were created based on secondary structure prediction, using the amino acid sequence as input for the software Jpred (12). The truncated nsp1 variants were cloned in an Escherichia coli expression plasmid derived from pET-28 under the control of the T7 promoter, in frame with 5' coding sequences for a His₆ tag followed by a spacer sequence that ends with ENLYFQG. This strategy allows the preparation of proteins that have only one extra glycine at the N terminus after proteolysis with tobacco etch virus protease. This expression strategy was selected after tests of different E. coli strains and different temperatures during the induction in 10-ml cultures. The samples were expressed in a microexpression device (M. S. Almeida, M. Geralt, R. Horst, and K. Wüthrich, unpublished), purified using Ni²⁺ affinity chromatography, concentrated with ultrafiltration centrifugal devices, and subjected to one-dimensional (1D) ¹H NMR screening using a Bruker DRX700 spectrometer with a 1-mm TXI HCN z-gradient microprobe. Based on its high-quality 1D ¹H NMR spectrum, the construct consisting of nsp1 residues 13 to 128 [nsp1(13-128)] was selected for NMR structure determination. In an attempt to further improve this sample, variant constructs of nsp1(13-128) with Cys52 replaced by Ala, Ser, Arg, or Asp were prepared by site-directed mutagenesis using the QuikChange kit (Stratagene) according to the manufacturer's instructions. The variants were evaluated by 1D ¹H NMR screening for a globular fold and by circular dichroism spectroscopy to determine their stability.

Protein preparation. Large-scale expression of uniformly ¹⁵N-labeled or ¹³C- and ¹⁵N-labeled nsp1(13-128) in E. coli BL21(DE3) cells was carried out at 18°C in 500 ml of M9 minimal medium containing either 0.5 g ¹⁵NH₄Cl or 0.5 g ¹⁵NH₄Cl and 2 g [¹³C₆]-D-glucose as the sole nitrogen and carbon sources, respectively. For the protein purification, the cells were disrupted by sonication in the presence of 25 mM HEPES at pH 8.0, 250 mM NaCl, 2 mM dithiothreitol, 0.03% NaN₃, and EDTA-free Complete protease inhibitor tablets (Roche). The cell lysate was loaded onto a 10-ml HisTrap FF column equilibrated with 50 mM imidazole in the same buffer system as mentioned above. The retained proteins were eluted with a 50 to 500 mM imidazole gradient and incubated with recombinant tobacco etch virus protease at 22°C for 2 days. The resulting solution was loaded onto a 300-ml Superdex 75 column equilibrated with 25 mM sodium phosphate at pH 7.0, 250 mM NaCl, and 0.03% NaN₃. The protein eluted with a retention volume equivalent to about 13 kDa. The solution was concentrated with ultrafiltration centrifugal devices and supplemented with 10% D₂O to a final sample volume of about 300 µl.

NMR spectroscopy and structure calculation. The NMR samples contained 2 mM nsp1(13-128). NMR spectra were collected at 298 K with Bruker Avance 600-MHz and Avance 800-MHz spectrometers equipped with TXI HCN z-gradient probes. The sequence-specific resonance assignment (66) has been described elsewhere (1). The input for the structure calculation consisted of the chemical shift list obtained from the resonance assignment, a 3D ¹⁵N-resolved [¹H,¹H]-nuclear Overhauser effect spectroscopy (NOESY) spectrum, and two 3D ¹³C-resolved [¹H,¹H]-NOESY spectra optimized for the aliphatic and aromatic ¹³C regions. The nuclear Overhauser effect (NOE) data were measured at 800 MHz with a mixing time of 60 ms. For the peak picking of the NOESY spectra, NOE assignment, and structure calculation, the stand-alone ATNOS/CANDID program (24, 25) was used in conjunction with the CYANA torsion angle dynamics algorithm (23). The standard protocol with seven cycles of peak picking, NOE assignment, and 3D structure calculation with simulated annealing in torsion angle space (24, 25) was applied. Backbone φ and ψ dihedral angle constraints derived from the Cα chemical shifts (40, 60) were used as supplementary data in the structure calculation. The 20 conformers with the lowest residual CYANA target function values obtained from cycle 7 of the ATNOS/CANDID/CYANA calculation were energy minimized in a water shell with the program OPALp (34, 39), using the AMBER force field (9). The program MOLMOL (33) was used to analyze the protein structure and to prepare the figures showing the NMR structures. Analysis of the stereochemical quality of the models was accomplished using the Joint Center for Structural Genomics validation central suite (http://www.jcsg.org) and the Protein Data Bank validation server (http://deposit.pdb.org/validate).

Steady-state ¹⁵N{¹H} NOEs were measured with transverse relaxation-optimized spectroscopy (TROSY)-based experiments (55, 69) on a Bruker Avance 600-MHz spectrometer, using a saturation period of 3 s and an interscan delay of 5 s.

Accession numbers. The chemical shifts have been deposited in the BioMagResBank (http://www.bmrb.wisc.edu) under accession number 7014. The atomic coordinates of the bundle of 20 conformers used to represent the nsp1 structure have been deposited in the Protein Data Bank (http://www.rcsb.org/pdb) with the code 2GDT, and those of the conformer closest to the mean coordinates have the code 2HSX.


RESULTS AND DISCUSSION 

The 179-residue nsp1 of SARS-CoV was included in a high-throughput proteomics characterization strategy by the consortium “Functional and Structural Proteomics of the SARS-CoV” (unpublished). The full-length protein and the fragment consisting of residues 1 to 159 were cloned, expressed, and purified by the Protein Production Core and given to us for NMR screening (50). The 1D ¹H NMR spectra of both constructs showed characteristics of a globular fold as well as of disordered regions (data not shown). Based on these results, the protein was transferred to us for an NMR structure determination.

Since nsp1 has no identifiable sequence similarity with proteins of known three-dimensional structure, it was not possible to predict the domain structure of this protein based on sequence comparisons. However, the presence of flexibly disordered regions in the protein identified by ¹H NMR spectroscopy (see “Characterization of the full-length SARS-CoV nsp1” below) was consistent with the results of secondary structure prediction, which indicated that a few residues at the N terminus, as well as a greater number of residues in the C-terminal one-third of the protein, would not adopt regular secondary structure. To investigate the boundaries of the globular domain and to optimize conditions for protein expression, sample preparation, and NMR structure determination, we designed a set of truncated variants of nsp1, bearing in mind the results of secondary structure predictions for this protein. The variant constructs have molecular masses of 12.7 kDa to 19.5 kDa, not including the N-terminal tag of 3.5 kDa (Table 1). These constructs were used to transform E. coli strains Rosetta(DE3), BL21(DE3) RIL, and BL21(DE3), and the recombinant proteins were expressed in a microshaker at 37°C, 27°C, and 18°C. The best growth rates and expression levels were obtained with the strain BL21(DE3). Table 1 provides a survey of the expression results with six different nsp1 constructs. Most of the protein in the samples expressed at 37°C was insoluble for all six variants. For two constructs, higher yields of soluble protein were obtained at 27°C, but the best results were achieved with expression at 18°C, where most of the expressed protein was in the soluble fraction.

TABLE 1. Summary of the recombinant production of nsp1 variants in BL21(DE3) E. coli cells

| Name | Molecular massᵃ (kDa) | Soluble expressionᵇ at 37°C | at 27°C | at 18°C | Protein recoveryᶜ (mg) | Expression levelᶜ (µM) |
|---|---|---|---|---|---|---|
| nsp1 | 19.5 | −/+ | −/+ | + | 3 | 15 |
| nsp1(13-179) | 18.3 | −/+ | −/+ | ++ | 2 | 11 |
| nsp1(1-149) | 16.1 | −/+ | −/+ | ++ | 3 | 19 |
| nsp1(13-149) | 15.0 | −/+ | −/+ | ++ | 2 | 13 |
| nsp1(1-128) | 13.8 | −/+ | + | ++ | 4 | 29 |
| nsp1(13-128) | 12.7 | −/+ | + | ++ | 3 | 24 |

ᵃ Without the expression tag of 3,500 Da (see text).
ᵇ The soluble expression levels were evaluated by denaturing polyacrylamide gel electrophoresis of the total cell lysate and of the soluble fraction thereof. −/+, less than 25% soluble protein; +, about 50% soluble protein; ++, more than 85% soluble protein.
ᶜ The value is for purified protein in a 10-ml volume of buffer. The proteins were purified from cells grown at 18°C in 10 ml of culture.

Based on the results in Table 1, BL21(DE3) E. coli cells in cultures at 18°C were used for the protein production. The proteins were purified by Ni²⁺ affinity chromatography and gel filtration chromatography. The final protein recovery in 10 ml of culture was in the range of 2 to 4 mg, which represents expression levels in the range of 11 to 29 µM (Table 1). Samples were concentrated for 1D ¹H NMR screening with a microcoil probe (Almeida et al., unpublished). The two shortest constructs, nsp1(1-128) and nsp1(13-128), exhibited the highest expression levels. The nsp1(13-128) construct was selected for the structure determination, based on its high-quality ¹H NMR spectrum and on its greater stability in comparison to the other five constructs of Table 1.
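The expression levels in Table 1 follow directly from the protein recovery and the molecular mass: milligrams divided by kilodaltons gives micromoles, and dividing by the 10-ml volume of footnote c gives the molar concentration. The short Python sketch below reproduces the tabulated values under that reading of the footnote; the names `TABLE1` and `expression_level_um` are illustrative and not part of the original work.

```python
# Recovery (mg) and molecular mass (kDa, expression tag excluded), copied from Table 1.
TABLE1 = {
    "nsp1": (3.0, 19.5),
    "nsp1(13-179)": (2.0, 18.3),
    "nsp1(1-149)": (3.0, 16.1),
    "nsp1(13-149)": (2.0, 15.0),
    "nsp1(1-128)": (4.0, 13.8),
    "nsp1(13-128)": (3.0, 12.7),
}
VOLUME_ML = 10.0  # volume stated in footnote c of Table 1


def expression_level_um(recovery_mg: float, mass_kda: float,
                        volume_ml: float = VOLUME_ML) -> float:
    """Molar concentration (µM) of the recovered protein in volume_ml."""
    # 1 kDa = 1 kg/mol, so mg / kDa is an amount in micromoles.
    micromoles = recovery_mg / mass_kda
    liters = volume_ml / 1000.0
    return micromoles / liters


for name, (mg, kda) in TABLE1.items():
    print(f"{name:14s} {expression_level_um(mg, kda):5.1f} µM")
```

Rounded to integers, the computed values match the last column of Table 1.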

Since cysteine residues are susceptible to oxidation and formation of intermolecular disulfide bonds, which can lead to unstable and heterogeneous protein samples, we also investigated variant constructs of nsp1(13-128) with Cys52 replaced by Ala, Ser, Arg, or Asp as part of our initial target optimization strategy, using 1D ¹H NMR and circular dichroism spectroscopy to evaluate their foldedness and stability. The variants with Ser52, Asp52, or Arg52 were thus found to be unstable. The variant with Cys52 replaced by Ala led to a stable, folded protein. However, after observing excellent sample stability of the wild-type protein despite the single Cys residue, we chose the wild-type protein for the structure determination.

TABLE 2. Input for the structure calculation and characterization of the bundle of 20 energy-minimized CYANA conformers representing the NMR structure of nsp1(13-128)

| Parameter | Valueᵃ |
|---|---|
| NOE upper distance limits (intraresidual, short-range, medium-range, long-range) | 2,659 (566, 692, 393, 1,008) |
| Dihedral angle constraints | 100 |
| Residual target function (Å²) | 1.96 ± 0.35 |
| Residual NOE violations: number > 0.1 Å | 28 ± 5 |
| Residual NOE violations: maximum (Å) | 0.19 ± 0.16 |
| Residual dihedral angle violations: number > 2.5° | 1 ± 1 |
| Residual dihedral angle violations: maximum (°) | 4.66 ± 1.27 |
| AMBER energies (kcal/mol): total | −3,966.42 ± 91.80 |
| AMBER energies (kcal/mol): van der Waals | −331.44 ± 14.88 |
| AMBER energies (kcal/mol): electrostatic | −4,633.80 ± 89.58 |
| RMSDᵇ from ideal geometry: bond lengths (Å) | 0.0075 ± 0.0002 |
| RMSDᵇ from ideal geometry: bond angles (°) | 2.039 ± 0.047 |
| RMSD to the mean coordinates (Å)ᶜ: bb (14-74, 85-125) | 0.45 ± 0.06 |
| RMSD to the mean coordinates (Å)ᶜ: ha (14-74, 85-125) | 0.91 ± 0.07 |
| Ramachandran plot statisticsᵈ (%): most favored regions | 73 |
| Ramachandran plot statisticsᵈ (%): additional allowed regions | 22 |
| Ramachandran plot statisticsᵈ (%): generously allowed regions | 3 |
| Ramachandran plot statisticsᵈ (%): disallowed regions | 2 |

ᵃ Except for the NOE upper distance limits, dihedral angle constraints, and Ramachandran plot statistics, the average values for the 20 energy-minimized conformers with the lowest residual CYANA target function values and the standard deviations among them are listed.
ᵇ RMSD, root mean square deviation.
ᶜ bb, backbone atoms N, Cα, and C'; ha, all heavy atoms. The numbers in parentheses indicate the residues for which the RMSD was calculated.
ᵈ As determined by PROCHECK (45).

The parameters in Table 2 show that a well-defined NMR structure of nsp1(13-128) was obtained. Above-average local disorder is limited to the C-terminal heptapeptide segment of residues 122 to 128 and to a disordered loop of residues 77 to 86 (Fig. 1a). The structure of intact nsp1 includes a globular domain of residues 13 to 121 and the disordered regions of residues 1 to 12 and 122 to 179 (Fig. 1b).
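The precision figures in Table 2 (RMSD to the mean coordinates, reported as mean ± standard deviation over the 20 conformers) and the choice of the conformer closest to the mean (deposited as 2HSX) can be illustrated with a minimal NumPy sketch. It assumes the conformers have already been superimposed on the atoms of interest (for example, the backbone of residues 14-74 and 85-125) and uses the population standard deviation; the function names are hypothetical, and this is not the MOLMOL procedure itself.

```python
import numpy as np


def rmsd_to_mean(bundle: np.ndarray) -> np.ndarray:
    """Per-conformer RMSD to the mean coordinates.

    bundle has shape (n_conformers, n_atoms, 3) and is assumed to be
    superimposed already on the selected atoms.
    """
    mean_coords = bundle.mean(axis=0)                    # (n_atoms, 3)
    sq_dist = ((bundle - mean_coords) ** 2).sum(axis=2)  # squared distance per atom
    return np.sqrt(sq_dist.mean(axis=1))                 # one value per conformer


def summarize_bundle(bundle: np.ndarray) -> tuple[float, float, int]:
    """Mean and (population) SD of the RMSDs, plus the index of the closest conformer."""
    per_conformer = rmsd_to_mean(bundle)
    closest = int(np.argmin(per_conformer))
    return float(per_conformer.mean()), float(per_conformer.std()), closest


# Toy bundle: three conformers of four atoms shifted along x by 0, 0 and 3 Å.
toy = np.zeros((3, 4, 3))
toy[2, :, 0] = 3.0
print(rmsd_to_mean(toy), summarize_bundle(toy))
```

In this toy case the mean coordinates lie at x = 1 Å, so the per-conformer RMSDs are 1, 1, and 2 Å and the first conformer is reported as closest to the mean.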

The structure of nsp1(13-128) represents a new fold. The sequential arrangement of the regular secondary structures in the globular domain of nsp1 is β1-α1-β2-3₁₀-β3-β4-β5-β6. There is a mixed parallel/antiparallel six-stranded β-barrel, where the spatial arrangement of the β-strands is β1-β2-β5-β3-β4-β6, and β1 makes contact with β6 (Fig. 2 and 3). The β-strands consist of residues 15 to 21, 52 to 56, 69 to 73, 87 to 92, 104 to 110, and 117 to 124. The helix α1 with residues 36 to 49 is located across one barrel opening, and the 3₁₀-helix of residues 62 to 64 is positioned alongside the barrel. A search of the Protein Data Bank using the structure of nsp1 as input for the DALI server (26) did not indicate statistically significant structural similarity to any other protein described to date.

FIG. 1. (a) Bundle of 20 energy-minimized CYANA conformers of nsp1(13-128). In this stereo view, the polypeptide backbone is shown as a gray spline function through the Cα positions. Selected sequence positions are identified by numerals. (b) Ribbon representation of the conformer closest to the mean coordinates of the bundle of 20 conformers used to represent the NMR structure. The β-strands are cyan, the helices are red, and polypeptide segments with nonregular secondary structure are gray. The regular secondary structures are further identified by lettering. The polypeptide segments shown in green represent the additional, structurally disordered polypeptide segments of the full-length nsp1.

For the continued discussion it is helpful to adopt a systematic analysis of β-barrels, using the number of strands (n); the shear number (S), which measures the stagger of the strands; and the tilt angle (α) of each strand, which is the angle between the barrel axis and the line adjusted for best fit to the N, Cα, and C' atoms of each strand (43, 46, 47). S must be an even integer because of the hydrogen bonding pattern between the β-strands (46). Using standard values for the mean Cα-Cα distance along the strands (a = 3.3 Å) and between the strands (b = 4.4 Å), the following geometric relations characteristic of β-barrel structures in proteins have been proposed (43):

$$\tan\alpha = \frac{Sa}{nb} \qquad (1)$$

$$R = \frac{\left[(Sa)^2 + (nb)^2\right]^{1/2}}{2n\,\sin(\pi/n)} \qquad (2)$$

R is the barrel radius, which is defined as the average of the distances between the Cα atoms of the three residues in opposite strands that are closest to the central part of the barrel (47).
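The measured tilt angle of a strand, as defined above, is the angle between the barrel axis and the line fitted through the strand's N, Cα, and C' atoms. A minimal NumPy sketch of that measurement follows, assuming the barrel axis is already available as a vector (the text does not say how it was determined); `strand_direction` and `strand_tilt_deg` are hypothetical helper names, and the least-squares line is obtained here from a singular value decomposition.

```python
import numpy as np


def strand_direction(backbone_xyz: np.ndarray) -> np.ndarray:
    """Unit vector along the least-squares line through a strand's N, CA and C' atoms.

    backbone_xyz has shape (n_atoms, 3).
    """
    centered = backbone_xyz - backbone_xyz.mean(axis=0)
    # The first right-singular vector is the direction of largest spread,
    # i.e. the orthogonal least-squares best-fit line through the atoms.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]


def strand_tilt_deg(backbone_xyz: np.ndarray, barrel_axis) -> float:
    """Acute angle (degrees) between a strand's best-fit line and the barrel axis."""
    axis = np.asarray(barrel_axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    # A fitted line has no direction, so |cos| gives the acute angle.
    cos_angle = abs(float(np.dot(strand_direction(backbone_xyz), axis)))
    return float(np.degrees(np.arccos(min(cos_angle, 1.0))))


# Synthetic check: atoms placed on a line tilted 60 degrees away from the z axis.
tilt = np.radians(60.0)
direction = np.array([np.sin(tilt), 0.0, np.cos(tilt)])
fake_strand = np.outer(np.arange(12) * 1.15, direction) + np.array([2.0, -1.0, 5.0])
print(round(strand_tilt_deg(fake_strand, [0.0, 0.0, 1.0]), 3))  # 60.0
```

Averaging such per-strand angles over all six strands gives the kind of mean tilt value that is compared with equation 1.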

The β-barrel in nsp1 contains six strands and has a shear number (S) of 10. The measured tilt of the strands to the barrel axis (α) ranges from 38° (β2) to 78° (β4), with an average value of 60°. The wide variation among the tilt angles of the individual strands reflects the pronounced irregularity of the nsp1 β-barrel (Fig. 2a and b).

The residues used to calculate the radius of the nsp1 barrel are 18 to 20, 53 to 55, 71 to 73, 86 to 88, 106 to 108, and 122 to 124, which give a mean barrel radius (R) of 7 ± 1 Å. Overall, for nsp1 the theoretical tilt angle of 51° calculated from equation 1 thus deviates from the observed average of 60°, whereas the theoretical mean barrel radius of 7 Å calculated from equation 2 is in close agreement with the observed value of 7 ± 1 Å.
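The theoretical values quoted above follow from equations 1 and 2 with n = 6, S = 10, a = 3.3 Å, and b = 4.4 Å: Sa = 33 Å and nb = 26.4 Å, so tan α = 1.25. The short Python sketch below evaluates both equations; the function names are illustrative, and the even-S check simply encodes the constraint stated in the text.

```python
import math

A_ALONG = 3.3    # Å, mean Cα-Cα distance along a strand (a)
B_BETWEEN = 4.4  # Å, mean Cα-Cα distance between neighboring strands (b)


def _check_shear(S: int) -> None:
    # S must be even because of the interstrand hydrogen-bonding pattern.
    if S % 2:
        raise ValueError("shear number S must be an even integer")


def theoretical_tilt_deg(n: int, S: int, a: float = A_ALONG, b: float = B_BETWEEN) -> float:
    """Equation 1: tan(alpha) = S*a / (n*b); returns alpha in degrees."""
    _check_shear(S)
    return math.degrees(math.atan((S * a) / (n * b)))


def theoretical_radius(n: int, S: int, a: float = A_ALONG, b: float = B_BETWEEN) -> float:
    """Equation 2: R = sqrt((S*a)**2 + (n*b)**2) / (2*n*sin(pi/n)), in Å."""
    _check_shear(S)
    return math.hypot(S * a, n * b) / (2 * n * math.sin(math.pi / n))


alpha = theoretical_tilt_deg(n=6, S=10)
radius = theoretical_radius(n=6, S=10)
print(f"alpha = {alpha:.0f} deg, R = {radius:.1f} Å")  # alpha = 51 deg, R = 7.0 Å
```

The 9° gap between the computed 51° and the observed 60° average tilt is the discrepancy discussed above, while the radius agrees.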

The interior of the nsp1 barrel and the interfaces between the two helices and the barrel surface consist primarily of hydrophobic residues. The arrangement of the side chains inside the barrel is highly compact, as expected for a barrel of six strands, but the inspection of space-filling models suggests that there is a tight cavity along the center of the barrel, with a radius of about 1.2 Å (not shown). The inside of the barrel consists of 17 side chains, which are contributed by all six strands and which are arranged in three layers. One layer contains L105 and the four hydrophilic residues E56, R74, K85, and R120. The four peripheral hydrophilic groups mediate the contacts with the solvent at the barrel opening opposite to helix α1 (Fig. 4) (in the orientation of Fig. 2b, these residues would be at the bottom of the structure). The charged groups of the side chains of these residues are fully solvent exposed, and E56 makes a salt bridge with R120. The side chain of V21 in the second layer and the βCH₂-γCH₂ fragment of R120 are located between this first layer and the other side chains of the second layer, which is in the narrowest portion of the barrel and includes the all-hydrophobic side chains of residues L54, I72, V87, and V122. A third layer consists of the side chains of residues L17, L19, V70, L89, A91, L108, and L124, which make hydrophobic contacts with the side chains of residues V36, A39, L40, A43, and L47 from the amphipathic helix α1, and with the side chains of residues C52, F32, P110, and P68. The second and third layers of β-strand side chains thus combine with the inner side of the helix α1 to form a large hydrophobic core (Fig. 4). It is worth noting that the variant proteins with Cys52 replaced by Ser, Asp, or Arg were unstable, which is consistent with a disruption of the β-barrel core, as one would predict from the NMR structure.

FIG. 2. Two stereo views of the globular domain of nsp1. (a) Ribbon presentation of the conformer of nsp1 closest to the mean coordinates of the bundle in Fig. 1a, shown in the same orientation as in Fig. 1a. The organization of the β-strands in the barrel is indicated by the labels. (b) Same as panel a after rotation about a horizontal axis, so that one looks at one side of the β-barrel; the axes of the β-barrel and the helix α1 are nearly perpendicular to each other, and those of the barrel and the 3₁₀-helix are nearly parallel to each other.

FIG. 3. Two topology diagrams of the nsp1 mixed parallel/antiparallel six-stranded β-barrel (see text). The numbering indicates the first and last residues of each β-strand.

FIG. 4. Stereo view of nsp1(13-128) in the same orientation as in Fig. 2b. The side chains in the interior of the barrel are differently colored to visualize their arrangement in three layers, as discussed in the text. The polypeptide backbone is shown as a gray spline function through the Cα positions. Amino acid side chains are shown as stick drawings. Color code: red, residues of layer 1 at the barrel opening opposite to helix α1, where the four hydrophilic residues are in solvent contact; green, residues in the central layer 2; blue, residues of the third layer, which make hydrophobic contacts to the residues shown in magenta at the top, where V36, A39, L40, A43, and L47 originate from the amphipathic helix α1.

In the SCOP database (48), there are 13 folds with β-barrels of n = 6 and S = 10. Nine of these folds contain antiparallel β-barrels, i.e., the QueA-like fold (41), the hypothetical protein HI1480 (37), the ferredoxin reductase-like fold (10), the phage tail protein (42), the flavin mononucleotide-binding split barrel (36), the reductase/isomerase/elongation factor common domain (10), the elongation factor/aminomethyltransferase common domain (32), the core binding factor β (28), and the surface presentation of antigens fold (16). Four folds have mixed parallel/antiparallel β-barrels similar to the nsp1 fold, but none has the topology observed in nsp1 (Fig. 3). These include the ribosomal protein L25-like fold (57), the β and β' subunits of DNA-dependent RNA polymerase (11), the double-ψ β-barrel (38), and the acid protease fold (51). In addition to the apparently unique β-strand topology and the irregular β-barrel geometry, another interesting feature of the nsp1 fold is that the polypeptide chains connecting the β-strands run along the side of the barrel, except for the loop between β3 and β4 (Fig. 2 and 3). This is a rare feature for barrels with n = 6 and S = 10; besides nsp1, it has been observed only between two β-strands in the ribosomal protein L25-like fold.

It is intriguing that none of the aforementioned folds is quite as irregular as that of nsp1. The distortion of the nsp1 structure seems to be related to the polypeptide segments connecting the β-strands across the side of the barrel. Interestingly, although the adjoining ends of strands β5 and β6 are the furthest apart in space of all strand combinations in nsp1 (approximately 15 Å between P110 and I117), they are connected by the shortest polypeptide segment across the side of the barrel (Fig. 2 and 3). This imposes a lower limit on the shear between these strands. The shear number of 10 seems to be the result of a balance between tight hydrophobic packing inside the nsp1 barrel, which is favored by lower shear numbers, and unstrained arrangement of the linker polypeptide segments on the outside of the barrel, which is favored by larger shear numbers. We discuss the β-barrel topology in such detail in order to advance the hypothesis that the outstanding irregularity of the nsp1 β-barrel might be related to a so-far-unknown, possibly entirely novel physiological function of nsp1.

The arrangement of the linker polypeptide segments on the outside of the barrel is also puzzling with regard to the folding pathway of nsp1. For example, if strand β1 formed hydrogen bonds with β2 early during translation, this would also fix the first linker across the barrel, which might limit the ease with which β6 could make hydrogen bonds with β4 and β1. Schemes representing the topology of the β-barrel (Fig. 3) would intuitively suggest that folding starts midway during translation with the formation of a β-hairpin of the strands β3 and β4. In the folded protein, this pair of β-strands forms the least distorted part of the β-barrel, with highly regular hydrogen bonds, and the loop between β3 and β4 is the only one that does not run along the barrel surface. In subsequent folding steps, the two-stranded sheets of β2 and β5 and of β1 and β6, respectively, might be formed, which also have quite regular hydrogen bonding in the nsp1 structure. The linkers between β2 and β3 and between β4 and β5 have almost the same lengths, which should help to position β5 close to β2 if β4 is arranged close to β3. The three regular two-stranded β-sheets (Fig. 3a) are connected in the barrel by the formation of irregular hydrogen bonding patterns.
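The linker lengths invoked in the folding argument can be read off the strand boundaries listed earlier (β1, 15 to 21; β2, 52 to 56; β3, 69 to 73; β4, 87 to 92; β5, 104 to 110; β6, 117 to 124). The Python sketch below counts the residues between consecutive strands, assuming those boundaries are exact; note that the β1-β2 linker includes helix α1 and the β3-β4 segment is the loop that does not run along the barrel surface. The dictionary keys and function name are illustrative.

```python
# β-strand boundaries (first, last residue) as given in the text, in sequence order.
STRANDS = {
    "beta1": (15, 21),
    "beta2": (52, 56),
    "beta3": (69, 73),
    "beta4": (87, 92),
    "beta5": (104, 110),
    "beta6": (117, 124),
}


def linker_lengths(strands: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Number of residues between the end of each strand and the start of the next."""
    names = list(strands)  # dicts keep insertion order, i.e. sequence order here
    return {
        f"{first}-{second}": strands[second][0] - strands[first][1] - 1
        for first, second in zip(names, names[1:])
    }


print(linker_lengths(STRANDS))
# {'beta1-beta2': 30, 'beta2-beta3': 12, 'beta3-beta4': 13, 'beta4-beta5': 11, 'beta5-beta6': 6}
```

With these boundaries, the β2-β3 and β4-β5 linkers differ by a single residue (12 versus 11), and the six-residue β5-β6 linker is the shortest of the segments that cross the side of the barrel.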

Characterization of the full-length SARS-CoV nsp1. The full-length nsp1 was characterized by comparison of the numbers of backbone ¹⁵N-¹H correlation peaks and the Hᴺ and ¹⁵N chemical shifts with those of nsp1(13-128), and by heteronuclear NOE measurements of the truncated and full-length nsp1. The truncated construct nsp1(13-128) has an NMR spectrum with large ¹H and ¹⁵N chemical shift dispersion (Fig. 5a), which is typical for a well-folded globular domain, where the atoms of different individual amino acid residues experience different local microsusceptibilities due to the nonperiodic nature of the interiors of globular proteins. The spectrum of the full-length protein, nsp1(1-179) (Fig. 5b), contains a set of peaks that overlays very closely with those of nsp1(13-128), showing that the globular domain is contained in both constructs. All the additional peaks have Hᴺ chemical shifts of 7.9 to 8.5 ppm, which is the region characteristic of “random-coil” polypeptide chains (66).

FIG. 5. (a) 2D [¹⁵N,¹H]-heteronuclear single-quantum coherence (HSQC) spectrum of nsp1(13-128). (b) 2D [¹⁵N,¹H]-HSQC spectrum of full-length nsp1(1-179). (c) 2D TROSY-based ¹⁵N{¹H} NOE experiment with full-length nsp1(1-179), with negative peaks shown in red. The spectra were recorded at a ¹H frequency of 600 MHz at 298 K.
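The spectral comparison above amounts to matching the (Hᴺ, ¹⁵N) peak list of the full-length protein against that of nsp1(13-128) and checking where the unmatched peaks fall. A minimal Python sketch of that bookkeeping follows; the matching tolerances and both peak lists are made up for illustration, and only the 7.9 to 8.5 ppm random-coil window comes from the text.

```python
# Hypothetical matching tolerances (ppm); the paper does not state any.
HN_TOL = 0.03
N_TOL = 0.3
RANDOM_COIL_HN = (7.9, 8.5)  # ppm, the window quoted in the text

Peak = tuple[float, float]  # (1H^N shift, 15N shift) in ppm


def unmatched_peaks(full_length: list[Peak], truncated: list[Peak],
                    hn_tol: float = HN_TOL, n_tol: float = N_TOL) -> list[Peak]:
    """Peaks of the full-length spectrum with no counterpart in the truncated one."""
    def same_peak(p: Peak, q: Peak) -> bool:
        return abs(p[0] - q[0]) <= hn_tol and abs(p[1] - q[1]) <= n_tol

    return [p for p in full_length if not any(same_peak(p, q) for q in truncated)]


def all_in_random_coil_window(peaks: list[Peak], window=RANDOM_COIL_HN) -> bool:
    """True if every peak's 1H^N shift lies inside the random-coil window."""
    low, high = window
    return all(low <= hn <= high for hn, _ in peaks)


# Made-up peak lists, for illustration only.
truncated_peaks = [(9.12, 121.4), (7.35, 115.0)]
full_length_peaks = [(9.11, 121.5), (7.36, 115.1), (8.21, 120.3), (8.05, 110.9)]
extra = unmatched_peaks(full_length_peaks, truncated_peaks)
print(extra, all_in_random_coil_window(extra))  # [(8.21, 120.3), (8.05, 110.9)] True
```

Real peak lists would usually be matched with assignment information rather than tolerances alone; the fixed windows here only keep the sketch short.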

A direct measure of intramolecular mobility is provided by the heteronuclear 2D ¹⁵N{¹H} NOE experiment, which is routinely used to assess protein dynamics on the picosecond to nanosecond timescale (55, 69). Positive signals with intensities of approximately 0.8 identify residues in the folded cores of small and medium-size globular proteins. In these cores, the mobility of the individual ¹⁵N-¹H moieties is restricted to the overall rotational tumbling of the molecule. This is illustrated by the ¹⁵N{¹H} NOE data for nsp1(13-128) (Fig. 6), which also serve as a reference for assessing the state of the additional chain segments in nsp1(1-179). ¹⁵N{¹H} NOE values of about 0.8 are seen for most of the residues in the regular secondary structure elements (Fig. 6). Increased flexibility of the polypeptide chain, which reduces the NOE intensities, is found in two places. The first is the disordered loop between residues 75 and 87. The second is the region of residues 94 to 103, which forms nonregular secondary structure with one γ-turn of residues 97 to 99 and a type II β-turn of residues 98 to 101 (Fig. 1). Most of the resonances in full-length nsp1 that are not present in nsp1(13-128) have either small positive or negative ¹⁵N{¹H} NOEs (Fig. 5c). This shows that the polypeptide segments of residues 1 to 12 and 129 to 179 are best described as a short N-terminal and a long C-terminal flexibly disordered tail, respectively (Fig. 1b). Interestingly, it has been determined that the carboxy-terminal half of the related protein MHV nsp1 is not needed for viral replication in culture, but it is important for efficient proteolytic cleavage between nsp1 and nsp2 and for optimal viral replication (7).

FIG. 6. Plot of the ¹⁵N{¹H} NOE intensities versus the sequence of nsp1(13-128). The data were collected at a ¹H frequency of 600 MHz at 298 K. The positions of the regular secondary structure elements are indicated. Each point represents the mean of three measurements, and the error bars represent the standard deviations of the three measurements.
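Fig. 6 reports each ¹⁵N{¹H} NOE as the mean and standard deviation of three measurements. The sketch below shows that bookkeeping for a single residue. It assumes that each replicate NOE is the ratio of peak intensities recorded with and without ¹H saturation (I_sat / I_ref). The numeric cut-offs in `classify` are hypothetical values chosen only to mirror the qualitative categories in the text: about 0.8 for the folded core, reduced values for flexible loops, and small positive or negative values for disordered tails.

```python
import statistics
from typing import List, Tuple

# Hypothetical cut-offs for a qualitative three-way classification.
FOLDED_MIN = 0.65
FLEXIBLE_MIN = 0.3


def het_noe(sat: List[float], ref: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of the NOE ratio I_sat / I_ref
    over replicate measurements of one residue (at least two needed)."""
    if len(sat) != len(ref) or len(sat) < 2:
        raise ValueError("need at least two paired replicate intensities")
    ratios = [s / r for s, r in zip(sat, ref)]
    return statistics.mean(ratios), statistics.stdev(ratios)


def classify(noe: float) -> str:
    """Map a mean NOE value onto the categories used in the text."""
    if noe >= FOLDED_MIN:
        return "folded core"
    if noe >= FLEXIBLE_MIN:
        return "increased flexibility"
    return "flexibly disordered"  # small positive or negative NOE


# Toy replicate intensities for one residue (not measured data).
mean, sd = het_noe([80.0, 82.0, 78.0], [100.0, 100.0, 100.0])
print(round(mean, 3), round(sd, 3), classify(mean))  # 0.8 0.02 folded core
```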

Structure-based search for nsp1 functions. In an initial attempt to identify leads to possible nsp1 functions, Fig. 7a identifies the solvent-exposed residues of nsp1, which would be sterically accessible for intermolecular contacts with reaction partners. These residues give rise to an uneven electrostatic surface charge distribution (Fig. 7b). One face has a large negatively charged surface, and the opposite surface is formed by hydrophobic, polar, and positively charged residues. The large contiguous patches of positive and negative surface charge could mediate specific as well as nonspecific intermolecular interactions by electrostatic forces. For example, since nsp1 has been shown to promote mRNA degradation (31), the area of positive charge on the molecular surface formed by K48, R125, and K126 (Fig. 7b) is of interest as a potential site for a direct interaction with mRNA. Alternatively, the positively and negatively charged areas of the protein surface might be involved in protein-protein interactions. In that case, nsp1 might exert its biological effect not by direct interactions with the mRNA but by interacting with other proteins involved in the regulation of cellular mRNA stability (49). In this context, it is worth noting that the NMR structure determination of nsp1(13-128) was performed in 250 mM NaCl because the protein precipitated at lower ionic strengths. This precipitation might be due to self-aggregation of nsp1 caused by the uneven charge distribution.
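The exposure criterion used for Fig. 7a is as follows. A residue counts as exposed if at least one side-chain atom has more than 50% surface accessibility. For glycines, the CO and HN groups are used instead. The function below is a minimal sketch of that rule. It assumes that per-atom fractional relative accessibilities (0 to 1) have already been computed with some accessibility program, and that atoms follow PDB naming. The backbone atom set and the function name are illustrative assumptions, not taken from the original analysis.

```python
from typing import Dict

# Backbone atom names (PDB convention, assumed); every other atom of a
# non-glycine residue is treated as a side-chain atom.
BACKBONE = {"N", "H", "CA", "HA", "C", "O"}

# For glycine, the carbonyl (C, O) and amide (N, H) groups are used instead.
GLY_GROUPS = {"C", "O", "N", "H"}


def is_exposed(resname: str, rel_access: Dict[str, float], cutoff: float = 0.5) -> bool:
    """rel_access maps atom name -> fractional relative solvent accessibility.
    Returns True if any relevant atom is strictly above the cutoff."""
    if resname.upper() == "GLY":
        atoms = [a for a in rel_access if a in GLY_GROUPS]
    else:
        atoms = [a for a in rel_access if a not in BACKBONE]
    return any(rel_access[a] > cutoff for a in atoms)


# Toy accessibilities: the exposed NZ atom makes this lysine count as exposed,
# while the highly accessible CA is ignored because it is a backbone atom.
print(is_exposed("LYS", {"CA": 0.9, "CB": 0.2, "NZ": 0.7}))  # True
```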

For comparisons with the nsp1 proteins of other coronaviruses, data are available for the p9 proteins of group 1 CoVs and for the p28 proteins of group 2a CoVs. Database searches such as BLAST or PSI-BLAST (2) did not identify significant sequence similarity between nsp1 of SARS-CoV and proteins of group 2a coronaviruses. However, pairwise alignment with the FFAS server (29), which employs a profile-profile method able to detect distant relationships, identified 20% sequence identity between SARS-CoV nsp1 and MHV nsp1 (p28) over 174 aligned residues (Fig. 7c). The FFAS score of -9.6 indicates that these two proteins might share the novel nsp1 three-dimensional fold. The remaining sequence divergence, however, leaves open the possibility that these proteins perform different functions even if they have a common fold. To our knowledge, it has not yet been determined whether the p28 proteins of group 2a coronaviruses also promote mRNA degradation, but MHV p28 expression has been shown to cause cell cycle arrest in cultured cells.
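A figure such as "20% sequence identity over 174 aligned residues" is a ratio of identical pairs to aligned columns. The sketch below computes it from two gapped, equal-length sequences. It uses one common convention, which is an assumption here: only columns where neither sequence has a gap count toward the denominator. FFAS may use a different denominator. The alignment in the example is a toy, not the nsp1/p28 alignment of Fig. 7c.

```python
from typing import Tuple


def percent_identity(a: str, b: str, gap: str = "-") -> Tuple[float, int]:
    """Percent identity over columns where neither sequence has a gap.
    Returns (identity in %, number of aligned columns)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x.upper(), y.upper()) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no aligned columns")
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs), len(pairs)


# Toy alignment (not the real nsp1/p28 alignment).
print(percent_identity("MKV-LRKAG", "MRVQLRKSG"))  # (75.0, 8)
```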

The most striking result of the alignment of SARS-CoV nsp1 with the polypeptide fragment consisting of residues 46 to 247 of MHV p28 is a consensus sequence, LRKxGxKG. In SARS-CoV nsp1, this sequence is positioned at the end of strand β6 of the globular domain. It is conserved not only in MHV p28 (Fig. 7c) but also in human CoV OC43 p28. It includes the two residues R125 and K126, which contribute to the positively charged patch on the nsp1 molecular surface (Fig. 7b). If future studies of the p28 proteins of group 2a CoVs show that these proteins share the mRNA degradation activity of SARS-CoV nsp1, this conserved region could be a candidate site for mRNA interaction.
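A consensus pattern such as LRKxGxKG, where x stands for any residue, can be scanned for in other p28 sequences with a simple regular expression. The sketch below is an illustration only. The function name is hypothetical, and the input sequence is a made-up toy string, not the nsp1 or p28 sequence. It reports 1-based start positions so that they can be compared directly with residue numbering.

```python
import re
from typing import List, Tuple


def find_motif(seq: str, motif: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched segment) for every occurrence of a
    motif in which 'x' stands for any residue; overlapping hits included."""
    pattern = re.compile("".join("." if c == "x" else re.escape(c) for c in motif))
    seq = seq.upper()
    n = len(motif)
    # Every motif character matches exactly one residue, so a match at
    # position i always spans seq[i:i + n].
    return [
        (i + 1, seq[i:i + n])
        for i in range(len(seq) - n + 1)
        if pattern.match(seq, i)
    ]


# Toy sequence (hypothetical, not the nsp1 sequence).
print(find_motif("AAALRKVGYKGSS", "LRKxGxKG"))  # [(4, 'LRKVGYKG')]
```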

Analysis of the nsp1 structure also points to functional differences between the p28 proteins and SARS-CoV nsp1. For example, Chen et al. identified the motif K109-R110-L111 in MHV p28 as a potential cyclin-binding motif (8), and SARS-CoV nsp1 lacks residues corresponding to R110 and L111. In addition, Chen et al. identified residues 30 to 33 (S/NPER) of p28 as a potential site for phosphorylation by cyclin-dependent kinases (8). These residues occur in an N-terminal 45-residue segment of p28 that appears not to be homologous to SARS-CoV nsp1. The propensity to induce cell cycle arrest may therefore be unique to MHV p28, or possibly to the group 2a p28 proteins in general. It might not be shared by SARS-CoV nsp1 even if it turned out that all these proteins share a similar fold.

In other comparisons, no significant sequence identity between SARS-CoV nsp1 and the nsp1 (p9) proteins of the group 1 CoVs could be detected. These results are consistent with the analysis by Snijder et al. (59), who described nsp1 as a specific marker of group 2 CoVs. The p9 proteins of group 1 CoVs most likely differ from those of group 2 CoVs in both structure and function.

The MHV p28 protein was subjected to a mutagenesis study by Brockway et al. (7), who generated single-amino-acid replacements and truncated versions of this protein and studied their impact on viral replication in cultured cells. Among the mutations found to affect viral replication, only some occur in residues conserved between MHV p28 and SARS-CoV nsp1 (Fig. 7c). Deletion of the entire p28 protein, or of the polypeptide segment from residue 87 to 164 of MHV p28, prevented the virus from productively infecting cultured cells (7). If MHV p28 and SARS-CoV nsp1 did indeed share a similar fold, the latter construct would lack most of the globular domain. In contrast, the carboxy-terminal half of MHV p28 (residues 124 to 241) has been shown to be dispensable for replication in culture, although it is important for efficient proteolytic cleavage of the protein and for optimal viral replication. If MHV p28 contained regular secondary structures similar to those of SARS-CoV nsp1, removing the polypeptide segment from residue 124 to 241 would eliminate the strands β3, β4, β5, and β6, as well as the flexibly disordered C-terminal tail. This would appear to entail a considerable disruption of the protein fold. The following considerations might help to resolve this apparent discrepancy. First, the increased flexibility and lack of a globular fold in the C-terminal region of the protein may ensure accessibility of the protease recognition site between nsp1 and nsp2, without being directly involved in the activity exerted by the protein. Second, the strands β1 and β2 and the helix α1 might provide a sufficiently stable fold to maintain the so-far-unidentified biological activity. This seems particularly plausible if one assumes that the additional N-terminal 45-residue segment of MHV p28, which is not homologous to SARS-CoV nsp1, could participate in a globular fold and help to stabilize the shortened protein.

FIG. 7. (a) Amino acid sequence of nsp1, with solvent-exposed residues highlighted in green. A residue is considered to be exposed if at least one atom of its side chain has more than 50% surface accessibility to the solvent. For glycines, the CO and HN exposure is considered. (b) Surface views of nsp1 in a space-filling representation. In the surface view shown on the left, the structure has the same orientation as in Fig. 2b. Some of the surface-exposed side chains discussed in the text are identified with the one-letter amino acid code and the residue number. Color code: gray, hydrophobic and polar residues; red, negatively charged; blue, positively charged. (c) Sequence alignment between SARS-CoV nsp1 and MHV nsp1 identified with the FFAS server. Identical residues are shown in red. Arrows indicate single-amino-acid replacements in MHV p28 that were generated and studied by Brockway et al. (7). Mutations that are detrimental to viral replication are identified by boldface, while those that are not detrimental are in italic. Residues removed in the truncated variant protein MHV nsp1ΔC are shown in lowercase (see text). Residues in β-strands and in helical secondary structures are underlined with solid and dashed lines, respectively.
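The argument above asks which secondary structure elements would fall inside a deleted segment. Answering that requires element boundaries expressed in MHV p28 numbering. These could be obtained by mapping the SARS-CoV nsp1 elements through the alignment in Fig. 7c, but that mapping is not reproduced here. The sketch below only illustrates the overlap bookkeeping, and the element boundaries are toy values. Each element that overlaps an inclusive deletion is labeled as entirely or partially removed.

```python
from typing import Dict, Tuple


def elements_removed(
    elements: Dict[str, Tuple[int, int]], deletion: Tuple[int, int]
) -> Dict[str, str]:
    """Map each secondary-structure element overlapping an inclusive
    deletion to 'full' (entirely removed) or 'partial'."""
    d0, d1 = deletion
    result = {}
    for name, (start, end) in elements.items():
        if start > d1 or end < d0:
            continue  # no overlap with the deleted segment
        result[name] = "full" if d0 <= start and end <= d1 else "partial"
    return result


# Toy element boundaries in one numbering scheme (hypothetical values).
toy = {"b1": (10, 15), "a1": (30, 40), "b3": (120, 126), "b6": (150, 158)}
print(elements_removed(toy, (124, 241)))  # {'b3': 'partial', 'b6': 'full'}
```

Distinguishing partial from complete removal matters for this kind of argument. A strand that loses only a few terminal residues may still pair, whereas a fully deleted strand cannot.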

In conclusion, this paper shows that the SARS-CoV protein nsp1, which is encoded at the 5' terminus of the genome, forms a previously unknown complex β-barrel fold with several unique structural features. We hypothesize that the uniqueness of the irregular β-barrel fold may be related to a so-far-unknown, unique biological function of nsp1. Defining the globular region of nsp1 and identifying surface residues that are likely to contribute to mRNA degradation activity may provide a platform for continued research on the role of this protein in SARS-CoV and in other coronaviruses.